class: center, middle, inverse, title-slide .title[ # Measures of Variability ] .subtitle[ ## EDP 613 ] .author[ ### Week 4 ] --- <script> function resizeIframe(obj) { obj.style.height = obj.contentWindow.document.body.scrollHeight + 'px'; } </script>
# Before we Begin Remember that a statistic is **resistant** if its value is not affected by extreme values (large or small) in the data set. So -- **Q**: Which of the measures of central tendency are resistant? -- **A**: Since the <span style="color:#5bc0de; font-style: italic;">median</span> is simply the middle value, it is not affected by outliers and <span style="color:#5bc0de">is</span> therefore <span style="color:#5bc0de">resistant</span>. --- # Basic Idea *Variability* basically tells us how far apart data points lie from each other and from the center of a distribution --- # Why? <span class="center">Generally</span> <br> -- .pull-left[ The ***central tendency*** tells us where most of our points lie ] -- .pull-right[ The ***variability*** summarizes how far apart the points are ] --- # What Does it Tell Us? <img src="Slides-Week-4_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- # Measures of Variability -- <br> <br> <br> .pull-left[ <p id="center" style="color:#f0b5d3; font-weight: bold; border:1px; border-style:solid; border-color:#f0b5d3; border-radius: 25px; padding: 0.3em;"> Range </p> ] -- .pull-right[ <p id="center" style="color:#f5ebd9; font-weight: bold; border:1px; border-style:solid; border-color:#f5ebd9; border-radius: 25px; padding: 0.3em;"> Interquartile range </p> ] -- <br> <br> <br> .pull-left[ <p id="center" style="color:#99d2dd; font-weight: bold; border:1px; border-style:solid; border-color:#99d2dd; border-radius: 25px; padding: 0.3em;"> Standard deviation </p> ] -- .pull-right[ <p id="center" style="color:#e1e1f9; font-weight: bold; border:1px; border-style:solid; border-color:#e1e1f9; border-radius: 25px; padding: 0.3em;"> Variance </p> ] --- ## The <span style="color:#f0b5d3; font-weight: bold;">Range</span> <div class="center"> <p> The <span style="color:#f0b5d3; font-weight: bold; font-style:italic;">range</span> of a data set is the difference between the largest value (Max) and the smallest 
value (Min)<br> \[\text{range} = \text{Max} − \text{Min}\] </p> </div> --- ### Example Compute the <span style="color:#f0b5d3; font-weight: bold;">range</span> for the <b>sample</b> of people <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 7 </td> </tr> </tbody> </table> -- <br> ---- <br> While not necessary, putting the data set in numerical order reduces the likelihood of making a silly mistake <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 7 </td> </tr> </tbody> </table> --- ### Steps We have `\(\,\text{Max}=7\,\)` and `\(\,\text{Min}=1\,\)` so `$$7-1=6$$` or in context <b>6 people</b> --- ### Example Compute the <span style="color:#f0b5d3; font-weight: bold;">range</span> for the <b>sample</b> $3.61, $3.84, $3.79, $3.61, $4.09, and $3.96. 
-- <br> ---- <br> First, for simplicity, we arrange the data set in numerical order <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 3.61 </td> <td style="text-align:center;"> 3.61 </td> <td style="text-align:center;"> 3.79 </td> <td style="text-align:center;"> 3.84 </td> <td style="text-align:center;"> 3.96 </td> <td style="text-align:center;"> 4.09 </td> </tr> </tbody> </table> --- ### Steps `$$4.09-3.61=0.48$$` <br> or in context <b>$0.48</b> --- ## The <span style="color:#f5ebd9; font-weight: bold;">interquartile range</span> Every data set has three quartiles >- `\(Q_1\)` >>- first quartile >>- 25th percentile >>- separates the lower 25% of the data from the upper 75% -- >- `\(Q_2\)` >>- second quartile >>- 50th percentile >>- separates the lower 50% of the data from the upper 50% >>- aka the *median* -- >- `\(Q_3\)` >>- third quartile >>- 75th percentile >>- separates the lower 75% of the data from the upper 25% --- <div class="center"> <p> The <span style="color:#f5ebd9; font-style: italic;">interquartile range</span> (IQR) is found by subtracting the first quartile from the third quartile<br> \[\text{IQR} = Q_3 − Q_1\] </p> </div> --- ### Outliers An *outlier* is a value that is considerably larger or smaller than most of the values in a data set --- ### Finding Outliers: IQR Method 1. Find the `\(\text{Min}\)` and `\(\text{Max}\)` -- 2. Find `\(Q_1\)`, `\(Q_2\)`, and `\(Q_3\)` -- 3. Compute the `\(\text{IQR}\)` -- 4.
Compute the cutoff points for determining outliers - aka *outlier boundaries* -- .pull-left[ > Lower Outlier Boundary (LOB) <br><br> `\(Q_1 − 1.5 \cdot\text{IQR}\)` ] -- .pull-right[ > Upper Outlier Boundary (UOB) <br><br> `\(Q_3 + 1.5 \cdot\text{IQR}\)` ] -- <br> `\(\,\,\,\,\,\,\)`<span>5.</span> Any data point -- .pull-left[ > Less than the LOB <br> is an outlier ] -- .pull-right[ > Greater than the UOB <br> is an outlier ] --- ### Example Over the span of 35 days, Jamie drives to work every weekday morning and keeps track of her time (in minutes) for some reason <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 15 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 18 </td> <td style="text-align:center;"> 19 </td> </tr> <tr> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> </tr> <tr> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> </tr> <tr> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> 23 </td> </tr> <tr> <td style="text-align:center;"> 23 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 26 </td> <td 
style="text-align:center;"> 31 </td> <td style="text-align:center;"> 36 </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;"> 39 </td> </tr> </tbody> </table> Construct a boxplot --- ### Steps 1. We have - `\(\text{Min}\)`: <b>15 minutes</b> -- - `\(\text{Max}\)`: <b>39 minutes</b> --- <span>2.</span> To find the position of `\(Q_1\)`, we have `\begin{align} \dfrac{25}{100}\cdot 35 &= 0.25\cdot 35\\\\ &=8.75\\\\ &\approx 9 \end{align}` -- `\(\,\,\,\,\,\,\)`which tells us to look for the <span style="color:#f37735;">data point in the 9th position</span> <table style="color: #65737e; width: auto !important; margin-left: auto; margin-right: auto;" class="table"> <tbody> <tr> <td style="text-align:center;"> 15 </td> <td style="text-align:center;"> <span style=" color: !important;">17</span> </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 18 </td> <td style="text-align:center;"> 19 </td> </tr> <tr> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> <span style=" color: #f37735 !important;">19</span> </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> </tr> <tr> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> <span style=" color: !important;">20</span> </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> </tr> <tr> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> <span style=" color: !important;">21</span> </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> 22 </td> <td
style="text-align:center;"> 22 </td> <td style="text-align:center;"> 23 </td> </tr> <tr> <td style="text-align:center;"> 23 </td> <td style="text-align:center;"> <span style=" color: !important;">24</span> </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 31 </td> <td style="text-align:center;"> 36 </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;"> 39 </td> </tr> </tbody> </table> `\(\,\,\,\,\,\,\)`or in context <b>19 minutes</b> --- To find the position of `\(Q_2\)`, we have `\begin{align} \dfrac{50}{100}\cdot 35 &= 0.50\cdot 35\\\\ &=17.50\\\\ &\approx 18 \end{align}` `\(\,\,\,\,\,\,\)`which tells us to look for the <span style="color:#f37735;">data point in the 18th position</span> <table style="color: #65737e; width: auto !important; margin-left: auto; margin-right: auto;" class="table"> <tbody> <tr> <td style="text-align:center;"> 15 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> <span style=" color: !important;">17</span> </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 18 </td> <td style="text-align:center;"> 19 </td> </tr> <tr> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> <span style=" color: !important;">19</span> </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> </tr> <tr> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> <span style=" color: #f37735 !important;">21</span> </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> </tr> <tr> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td
style="text-align:center;"> 21 </td> <td style="text-align:center;"> <span style=" color: !important;">22</span> </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> 23 </td> </tr> <tr> <td style="text-align:center;"> 23 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> <span style=" color: !important;">31</span> </td> <td style="text-align:center;"> 36 </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;"> 39 </td> </tr> </tbody> </table> `\(\,\,\,\,\,\,\)`or in context the *median* is <b>21 minutes</b> --- To find the position of `\(Q_3\)`, we have `\begin{align} \dfrac{75}{100}\cdot 35 &= 0.75\cdot 35\\\\ &=26.25\\\\ &\approx 26 \end{align}` `\(\,\,\,\,\,\,\)`which tells us to look for the <span style="color:#f37735;">data point in the 26th position</span> <table style="color: #65737e; width: auto !important; margin-left: auto; margin-right: auto;" class="table"> <tbody> <tr> <td style="text-align:center;"> 15 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> 17 </td> <td style="text-align:center;"> <span style=" color: !important;">17</span> </td> <td style="text-align:center;"> 18 </td> <td style="text-align:center;"> 19 </td> </tr> <tr> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> <span style=" color: !important;">19</span> </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> </tr> <tr> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> <span style=" color: !important;">21</span> </td> <td
style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> </tr> <tr> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> <span style=" color: #f37735 !important;">22</span> </td> <td style="text-align:center;"> 22 </td> <td style="text-align:center;"> 23 </td> </tr> <tr> <td style="text-align:center;"> 23 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 31 </td> <td style="text-align:center;"> <span style=" color: !important;">36</span> </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;"> 39 </td> </tr> </tbody> </table> `\(\,\,\,\,\,\,\)`or in context <b>22 minutes</b> --- <span>3.</span> To find the interquartile range, we have `\begin{align} \text{IQR} &= 22-19\\ &=3 \end{align}` `\(\,\,\,\,\,\,\)`or in context <b>3 minutes</b> --- <span>4.</span> To find the boundaries, we have <br> `\begin{align} \text{LOB} &= 19-1.5\cdot3\\ &=19-4.5\\ &=14.5 \end{align}` `\begin{align} \text{UOB} &= 22+1.5\cdot3\\ &=22+4.5\\ &=26.5 \end{align}` <br> `\(\,\,\,\,\,\,\)`giving us <b>14.5</b> and <b>26.5 minutes</b>, respectively --- ## Five-number summary .pull-left[ <br> <br> <br> <br> <br> <br> <p id="center"> Report on </p> ] -- .pull-right[ > `$$\text{Min}$$` <br> > `$$Q_1$$` <br> > `$$Q_2$$` <br> > `$$Q_3$$` <br> > `$$\text{Max}$$` ] --- <img src="Slides-Week-4_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> --- ### Example The following are the numbers of grams of carbohydrates in 12-ounce espresso beverages offered at Starbucks <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 43 </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;">
44 </td> <td style="text-align:center;"> 31 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 39 </td> <td style="text-align:center;"> 59 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 54 </td> </tr> <tr> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 25 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 30 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 41 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 14 </td> </tr> </tbody> </table> -- <br> ---- <br> First we will benefit from reordering the data set <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 25 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 30 </td> <td style="text-align:center;"> 31 </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;"> 39 </td> <td style="text-align:center;"> 41 </td> <td style="text-align:center;"> 43 </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 54 </td> <td style="text-align:center;"> 59 </td> </tr> </tbody> </table> --- ### Steps 1. 
We have - `\(\text{Min}\)`: <b>9 grams</b> - `\(\text{Max}\)`: <b>59 grams</b> --- <span>2.</span> To find the position of `\(Q_1\)`, we have `\begin{align} \dfrac{25}{100}\cdot 22 &= 0.25\cdot 22\\ &=5.50\\ &\approx 6 \end{align}` `\(\,\,\,\,\,\,\)`which tells us to look for the <span style="color:#f37735;">data point in the 6th position</span> <table style="color: #65737e; width: auto !important; margin-left: auto; margin-right: auto;" class="table"> <tbody> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> <span style=" color: #f37735 !important;">14</span> </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 25 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 30 </td> <td style="text-align:center;"> 31 </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;"> 39 </td> <td style="text-align:center;"> 41 </td> <td style="text-align:center;"> 43 </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 54 </td> <td style="text-align:center;"> 59 </td> </tr> </tbody> </table> `\(\,\,\,\,\,\,\)`or in context <b>14 grams</b> --- To find the position of `\(Q_2\)`, we have `\begin{align} \dfrac{50}{100}\cdot 22 &= 0.50\cdot 22\\ &=11 \end{align}` `\(\,\,\)`which tells us to look for the <span style="color:#f37735;">data point in the 11th position</span> <table style="color: #65737e; width: auto !important; margin-left: auto; margin-right: auto;" class="table"> <tbody> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 14 </td>
<td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 25 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> <span style=" color: #f37735 !important;">27</span> </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 30 </td> <td style="text-align:center;"> 31 </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;"> 39 </td> <td style="text-align:center;"> 41 </td> <td style="text-align:center;"> 43 </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 54 </td> <td style="text-align:center;"> 59 </td> </tr> </tbody> </table> `\(\,\,\)`or in context the *median* is <b>27 grams</b> --- To find the position of `\(Q_3\)`, we have `\begin{align} \dfrac{75}{100}\cdot 22 &= 0.75\cdot 22\\ &=16.50\\ &\approx 17 \end{align}` `\(\,\,\)`which tells us to look for the <span style="color:#f37735;">data point in the 17th position</span> <table style="color: #65737e; width: auto !important; margin-left: auto; margin-right: auto;" class="table"> <tbody> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 24 </td> <td style="text-align:center;"> 25 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 26 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 30 </td> <td style="text-align:center;"> 31 </td> <td style="text-align:center;"> 38 </td> <td style="text-align:center;"> 39 </td> <td style="text-align:center;"> <span style=" color: #f37735 !important;">41</span> </td> <td
style="text-align:center;"> 43 </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 54 </td> <td style="text-align:center;"> 59 </td> </tr> </tbody> </table> `\(\,\,\)`or in context <b>41 grams</b> --- <span>3.</span> To find the interquartile range, we have `\begin{align} \text{IQR} &= 41-14\\ &=27 \end{align}` `\(\,\,\,\,\,\,\)`or in context <b>27 grams</b> --- <span>4.</span> To find the boundaries, we have <br> `\begin{align} \text{LOB} &= 14-1.5\cdot27\\ &=14-40.5\\ &=-26.5 \end{align}` `\begin{align} \text{UOB} &= 41+1.5\cdot27\\ &=41+40.5\\ &=81.5 \end{align}` <br> `\(\,\,\,\,\,\,\)`giving us <b>-26.5</b> and <b>81.5 grams</b>, respectively .footnote[Realistically this is between 0 and 81.5 grams unless you can make a good argument that coffee can have negative grams of carbohydrates] --- <img src="Slides-Week-4_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> --- ## The <span style="color:#99d2dd; font-weight: bold;">standard deviation</span> In a nutshell, a <span style="color:#99d2dd; font-style: italic;">standard deviation</span> is a number we use to tell how measurements for a group of things are spread out from the average, which in our case is the mean .pull-left[ <p id="center"> <b>Population</b> </p> `$$\sigma=\sqrt{\dfrac{\sum\left(Y-\overline{Y}\right)^2}{\color{VioletRed}{N}}}$$` ] -- .pull-right[ <p id="center"> <b>Sample</b> </p> `$$s=\sqrt{\dfrac{\sum\left(Y-\overline{Y}\right)^2}{\color{VioletRed}{n-1}}}$$` ] -- <p id="center"> `Y` is a data point<br><br> `\overline{Y}` is the mean </p> -- .pull-left[ <p id="center">`N` is the <b>population</b> <span style="color:#c2f2d0;">size</span><br><br> `\sigma` is the <b>population</b> <span style="color:#99d2dd; font-weight: bold;">standard deviation</span></p> ] .pull-right[ <p id="center">`n` is the <b>sample</b> <span style="color:#c2f2d0;">size</span><br><br> `s` is the <b>sample</b> <span
style="color:#99d2dd; font-weight: bold;">standard deviation</span></p> ] -- .footnote[If you want to know why we divide by `n-1` in a sample standard deviation, that is a pretty interesting topic and you can explore more about that over at [Khan Academy](https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/more-standard-deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance)] --- ### What Do These Look Like? <img src="Slides-Week-4_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> --- ### Example Calculate the <b>sample</b> <span style="color:#99d2dd; ; font-weight: bold;">standard deviation</span> of the following set of data points by hand <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 69 </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> 60 </td> <td style="text-align:center;"> 52 </td> <td style="text-align:center;"> 41 </td> </tr> </tbody> </table> -- <br> ---- <br> Again, putting the data set in numerical order can make it easier to track <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> 41 </td> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 52 </td> <td style="text-align:center;"> 60 </td> <td style="text-align:center;"> 69 </td> </tr> </tbody> </table> --- ### Steps 1. 
Compute the mean `\begin{align} \overline{Y} &= \dfrac{32+41+46+52+60+69}{6}\\\\ &=\frac{300}{6}\\\\ &=50 \end{align}` --- <span>2.</span> Compute the deviations and square them <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> `Y` </th> <th style="text-align:center;"> `Y-\overline{Y}` </th> <th style="text-align:center;"> `(Y-\overline{Y})^2` </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;width: 10em; "> 32 </td> <td style="text-align:center;width: 10em; "> -18 </td> <td style="text-align:center;width: 10em; "> 324 </td> </tr> <tr> <td style="text-align:center;width: 10em; "> 41 </td> <td style="text-align:center;width: 10em; "> -9 </td> <td style="text-align:center;width: 10em; "> 81 </td> </tr> <tr> <td style="text-align:center;width: 10em; "> 46 </td> <td style="text-align:center;width: 10em; "> -4 </td> <td style="text-align:center;width: 10em; "> 16 </td> </tr> <tr> <td style="text-align:center;width: 10em; "> 52 </td> <td style="text-align:center;width: 10em; "> 2 </td> <td style="text-align:center;width: 10em; "> 4 </td> </tr> <tr> <td style="text-align:center;width: 10em; "> 60 </td> <td style="text-align:center;width: 10em; "> 10 </td> <td style="text-align:center;width: 10em; "> 100 </td> </tr> <tr> <td style="text-align:center;width: 10em; "> 69 </td> <td style="text-align:center;width: 10em; "> 19 </td> <td style="text-align:center;width: 10em; "> 361 </td> </tr> </tbody> </table> --- <span>3.</span> Calculate the sum of the squared deviations `\begin{align} \sum\left(Y-\overline{Y}\right)^2 &= 324+81+16+4+100+361\\\\ &=886 \end{align}` --- <span>4.</span> Divide by the sample size minus one (`n-1`) `\begin{align} \dfrac{886}{6-1} &= \dfrac{886}{5}\\\\ &=177.2 \end{align}` --- <span>5.</span> Take the square root <br> `$$\sqrt{177.2}\approx 13.31$$` <br> `\(\,\,\,\,\,\,\)`implying that ***a typical data point lies about 13.31 units from the mean*** --- ## The <span
style="color:#e1e1f9; font-weight: bold;">Variance</span> In a nutshell, a <span style="color:#e1e1f9; font-style: italic">variance</span> is a number we use to tell how measurements for a group of things are spread out from the average, which in our case is the mean. Because it is built from squared deviations, it is never negative .pull-left[ <p id="center"> <b>Population</b> </p> `$$\sigma^2=\dfrac{\sum\left(Y-\overline{Y}\right)^2}{\color{VioletRed}{N}}$$` ] -- .pull-right[ <p id="center"> <b>Sample</b> </p> `$$s^2=\dfrac{\sum\left(Y-\overline{Y}\right)^2}{\color{VioletRed}{n-1}}$$` ] -- <p id="center"> `Y` is a data point<br><br> `\overline{Y}` is the mean </p> -- .pull-left[ <p id="center">`N` is the <b>population</b> <span style="color:#c2f2d0;">size</span><br><br> `\sigma^2` is the <b>population</b> <span style="color:#e1e1f9; font-weight: bold;">variance</span></p> ] .pull-right[ <p id="center">`n` is the <b>sample</b> <span style="color:#c2f2d0;">size</span><br><br> `s^2` is the <b>sample</b> <span style="color:#e1e1f9; font-weight: bold;">variance</span></p> ] --- ### Example Calculate the <b>sample</b> **variance** of the following set of data points by hand <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 69 </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> 60 </td> <td style="text-align:center;"> 52 </td> <td style="text-align:center;"> 41 </td> </tr> </tbody> </table> --- We actually already calculated this!
Let's go back to step 4 -- <span>4.</span> Divide by the sample size minus one (`n-1`) `\begin{align} \dfrac{886}{6-1} &= \dfrac{886}{5}\\\\ &=177.2 \end{align}` `\(\,\,\,\,\,\,\)`This is actually the <b>sample</b> <span style="color:#e1e1f9; font-weight: bold;">variance</span> --- # Joined at the Hip <br> <span class="center">The <span style="color:#99d2dd; ; font-weight: bold;">standard deviation</span> is just the square root of the <span style="color:#e1e1f9; font-weight: bold;">variance</span></span> -- <br> <br> <span class="center">or equivalently</span> <br> <br> -- <span class="center">the <span style="color:#e1e1f9; font-weight: bold;">variance</span> is just the square of the <span style="color:#99d2dd; ; font-weight: bold;">standard deviation</span></span> <br> <br> -- <span class="center"> so</span> <br> <br> -- <span class="center"> you can't have one without the other</span> --- ## That's it. Let's take a break before working in R.
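---

### Checking the Hand Calculations

The hand calculations in this deck can be double-checked with a few lines of code. What follows is a minimal sketch in Python rather than the R used in class. The helper names `quartile` and `outlier_bounds` are made up for illustration, and `quartile` assumes the position rule used on these slides: multiply the percentile by the number of data points and round to the nearest whole position. Statistical software usually interpolates between positions, so built-in quartile functions may return slightly different values.

```python
import statistics


def quartile(sorted_data, percent):
    """Value at position percent/100 * n, rounded to the nearest whole
    position (halves round up), counting positions from 1.
    This mirrors the rule on the slides; it is an assumption, not a standard."""
    n = len(sorted_data)
    position = max(1, (percent * n + 50) // 100)  # integer round-half-up
    return sorted_data[position - 1]


def outlier_bounds(data):
    """Return (LOB, UOB) = (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    ordered = sorted(data)
    q1 = quartile(ordered, 25)
    q3 = quartile(ordered, 75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Jamie's 35 commute times in minutes, copied from the example
commute = [
    15, 17, 17, 17, 17, 18, 19,
    19, 19, 19, 19, 19, 20, 20,
    20, 20, 20, 21, 21, 21, 21,
    21, 21, 21, 22, 22, 22, 23,
    23, 24, 26, 31, 36, 38, 39,
]
lob, uob = outlier_bounds(commute)
outliers = [t for t in commute if t < lob or t > uob]
print("range:", max(commute) - min(commute))
print("boundaries:", lob, uob)
print("outliers:", outliers)

# The six data points from the standard deviation example
points = [46, 69, 32, 60, 52, 41]
sample_variance = statistics.variance(points)  # divides by n - 1
sample_sd = statistics.stdev(points)           # square root of the variance
print("variance:", sample_variance)
print("standard deviation:", round(sample_sd, 2))
```

With the commute data this gives boundaries of 14.5 and 26.5 minutes, so the four longest commutes (31, 36, 38, and 39 minutes) are flagged as outliers.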